Abstract. Research spanning decades has generated a long list of phenomena
associated  with  human  spatial  information  processing.  Additionally,  a  number 
of  theories  have  been  proposed  about  the  representation,  organization  and 
processing  of  spatial  information  by  humans.  This  paper  presents  a  broad 
account  of  human  spatial  competence,  integrated  with  the  ACT-R  cognitive 
architecture.  Using  a  cognitive  architecture  grounds  the  research  in  a  validated 
theory  of  human  cognition,  enhancing  the  plausibility  of  the  overall  account. 
This work posits a close link between aspects of spatial information processing and
vision and motor planning, and integrates theoretical perspectives that have
been  proposed  over  the  history  of  research  in  this  area.  In  addition,  the  account 
is  supported  by  evidence  from  neuropsychological  investigations  of  human 
spatial  ability.  The  mechanisms  provide  a  means  of  accounting  for  a  broad 
range  of  phenomena  described  in  the  experimental  literature. 
Keywords: Frame of Reference, Vision, Representation, Mechanism, ACT-R.


1  Introduction 

In  this  paper,  we  present  a  broad  theoretical  architecture  for  understanding  human 
spatial  competence.  Human  spatial  abilities  are  brought  to  bear  in  a  variety  of 
contexts,  and  in  a  variety  of  ways.  Spatial  information  processing  is  utilized  for 
navigation  and  wayfinding  [1],  [2],  map  reading  and  orientation  [3],  [4],  [5],  and 
spatial  transformations  like  mental  rotation  [6],  [7].  However,  spatial  abilities  are  also 
recruited  for  syllogistic  reasoning  tasks  [8],  problem  solving  [9],  [10],  and  language 
processing [11], [12]. This flexibility and diversity require that an account of human
spatial  abilities  be  able  to  address  a  range  of  specific  abilities  within  the  context  of 
overall  cognitive  functioning. 

In  addition  to  breadth,  an  understanding  of  human  spatial  competence  requires  a  grasp 
of  the  details  of  the  mechanisms  involved  in  encoding,  processing,  and  using  spatial 
knowledge.  This  includes  questions  concerning  how  spatial  information  is  represented,  as 
well  as  the  mechanisms  that  are  available  for  manipulating  those  representations  [13], 
[14],  [15].  The  literature  contains  many  theories  that  address  various  aspects  of  spatial 
information  processing,  including  representations  of  environmental  information  [11], 
[16], [17], visuospatial working memory [18], [19], reasoning with spatial mental models
[20],  [21],  mental  imagery  [14],  [22],  and  navigation  [23],  [24],  [25].  What  currently  does 
not  exist,  however,  is  an  integrated  theory  that  provides  an  account  of  human 
performance  across  different  domain  areas. 

The  theory  presented  in  this  paper  addresses  each  of  these  general  areas  of  human 
spatial competence, to provide broad coverage of how humans encode, store, and use
spatially-based  information  to  perform  a  variety  of  tasks  in  different  domains. 
Because  of  the  scope  of  the  challenge,  we  have  tried  to  strike  a  balance  between 
presenting  the  breadth  of  the  theory,  while  describing  the  components  in  sufficient 
detail  to  permit  a  thorough  evaluation.  We  have  grounded  the  account  in  the  ACT-R 
cognitive architecture [26], which provides a well-validated theory of overall human
information  processing.  We  do  this  to  connect  our  work  to  a  more  general  theory  of 
human  cognition.  This  provides  us  with  important  constraints  on  our  account  and 
allows  us  to  focus  more  specifically  on  mechanisms  for  spatial  information 
processing,  since  the  existing  ACT-R  architecture  provides  validated  mechanisms  for 
other  critical  components  of  the  human  cognitive  system.  Although  the  mechanisms 
we propose are not yet implemented, they are specified in enough detail to identify
accounts for various phenomena, some of which are described briefly in the
remainder of this paper. To begin, we address several important issues in the realm
of  spatial  competence  in  the  next  several  subsections.  Dealing  with  critical  concepts 
from  the  literature  at  the  outset  hopefully  will  clarify  our  approach  and  simplify  the 
discussion  of  other  points  in  the  remainder  of  the  paper. 

1.1  The  Cognitive  Map 

Tolman’s  seminal  article,  “Cognitive  Maps  in  Rats  and  Men”  [27],  is  generally 
associated  with  the  origin  of  modern  research  into  spatial  information  processing. 
Since  then,  the  term  cognitive  map  has  played  a  central  role  in  theorizing  about 
human  spatial  abilities.  Many  theories  have  been  developed  that  claim  humans 
automatically  generate  an  exocentric  cognitive  map  of  the  environment  based  upon 
experience in a space (cf. [28], [29], [30]). Proponents of these theories have pointed
to  the  discovery  of  place  cells  in  the  rat  [31]  and  human  [32]  hippocampus  as  key 
evidence  for  this  view.  The  alternative  that  is  most  commonly  offered  is  egocentric 
encoding  of  spatial  information,  where  the  locations  of  items  in  the  environment  are 
encoded  with  respect  to  the  coordinate  system  defined  by  the  location  and  orientation 
of  the  viewer  (e.g.,  [11],  [33]). 

Evidence  has  accumulated  on  both  sides  of  this  debate  (e.g.,  [34],  [35],  [36],  [37]). 
However, we find the evidence arguing against exocentric cognitive maps as the
default format for human spatial representations to be compelling.
This is not to say that humans cannot or do not sometimes represent space using
exocentric reference frames. Rather, our claim is that humans do not automatically
construct a cognitive map* of the environment based on visual perception. Instead, we
believe that spatial information is encoded in a fragmented manner by default, using
multiple coordinate systems to represent spatial locations. Initially, early vision
utilizes a retinotopic coordinate system, which can be used for guiding and directing
visual attention [38]. We propose that the perceptual system generates two enduring,
high-level encodings of spatial location from visual input, one based on the egocentric
frame of reference (distance and bearing from self), and one based on a frame of
reference defined by salient features of the environment (e.g., the boundaries of a
room or a prominent landmark). The evidence for these representations comes from
functional considerations, described next, and findings from neuropsychological
research (Section 4).

* We use the term ‘cognitive map’ to refer to the notion of an internal, exocentric representation
of space that is akin to a paper-based map. While the term initially held a much broader
connotation, this has been largely lost in current usage.

Importantly,  egocentric  and  exocentric  frames  of  reference  support  different 
functions  within  the  system  (e.g.,  [33]).  Encoding  location  with  respect  to  an 
egocentric  frame  of  reference  facilitates  acting  on  objects  in  the  world  ([17],  [39], 
[40]).  To  interact  with  an  object,  it  is  critical  to  have  knowledge  of  the  relationship 
between  oneself  and  the  object.  In  addition,  this  representation  of  location  is  a 
primitive  in  visual  perception,  where  perceived  distance  and  bearing  of  an  object  can 
be  inferred  directly  from  the  visual  stimulus  [33].  In  contrast,  location  information 
based  upon  an  exocentric  frame  of  reference  is  important  for  grounding  spatial 
information  in  the  environment  and  for  computing  spatial  relations.  For  these  tasks,  it 
is  necessary  that  locational  information  be  represented  within  a  common  coordinate 
system.  The  egocentric  reference  frame  is  not  appropriate  for  such  tasks,  since  any 
movement  or  rotation  by  the  viewer  produces  a  change  to  the  coordinate  system  [33]. 
Thus,  location  information  based  on  an  exocentric  reference  frame  is  needed  to  link 
locational  information  for  multiple  objects  for  making  spatial  judgments.  Spatial 
processes,  in  conjunction  with  imagery,  can  be  applied  to  generate  more  complex 
representations  for  multiple  objects  from  these  elements  as  well  (e.g.,  a  cognitive 
map). However, this is an effortful process that inherits the error and bias
associated with human visual perception; it is not an automatic, unconscious process
providing  an  integrated  representation  of  the  environment. 
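To make the functional contrast concrete, here is a minimal Python sketch, assuming a 2D world, bearings in radians measured counterclockwise from the viewer's heading, and hypothetical names (`Pose`, `to_egocentric`, `to_exocentric`) that are illustrative only and not part of ACT-R. It shows that rotating the viewer changes an object's egocentric bearing while the object's exocentric (room) coordinates stay fixed.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical viewer pose expressed in an exocentric (room-based) frame."""
    x: float
    y: float
    heading: float  # radians, counterclockwise from the room's +x axis


def to_egocentric(pose, obj_xy):
    """Return (distance, bearing) of an object relative to the viewer.

    Bearing is measured counterclockwise from the viewer's heading and
    wrapped into [-pi, pi).
    """
    dx, dy = obj_xy[0] - pose.x, obj_xy[1] - pose.y
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - pose.heading
    bearing = (bearing + math.pi) % (2 * math.pi) - math.pi
    return distance, bearing


def to_exocentric(pose, distance, bearing):
    """Invert to_egocentric: recover room coordinates from (distance, bearing)."""
    angle = pose.heading + bearing
    return (pose.x + distance * math.cos(angle),
            pose.y + distance * math.sin(angle))


door = (4.0, 3.0)                      # illustrative room coordinates
before = Pose(0.0, 0.0, 0.0)
after = Pose(0.0, 0.0, math.pi / 2)    # viewer turns 90 degrees to the left

d1, b1 = to_egocentric(before, door)
d2, b2 = to_egocentric(after, door)
print(round(d1, 3), round(b1, 3))      # 5.0 0.644
print(round(d2, 3), round(b2, 3))      # 5.0 -0.927
print([round(v, 6) for v in to_exocentric(after, d2, b2)])  # [4.0, 3.0]
```

The egocentric pair must be recomputed after every movement, whereas the exocentric coordinates can be stored once and compared across objects, which is the division of labor argued for above.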

1.2  Hierarchical  Encoding 

There  is  substantial  evidence  for  a  hierarchical  component  to  spatial  information 
processing  (e.g.,  [41],  [42],  [43]),  and  any  serious  theory  of  human  spatial 
competence  needs  to  account  for  these  findings.  In  our  account,  hierarchical 
phenomena  arise  as  a  consequence  of  the  frames  of  reference  used  for  visual 
encoding. The frame of reference used to encode a given item is selected based upon the
contents of the visual experience. To take a famous example from Stevens & Coupe
[43],  when  studying  a  map  of  the  United  States,  San  Diego  will  tend  to  be  encoded 
with  respect  to  the  state  of  California,  and  Reno  will  tend  to  be  encoded  with  respect 
to  the  state  of  Nevada.  To  compare  the  relative  locations  of  these  two  cities,  however, 
requires  that  they  be  positioned  within  the  same  frame  of  reference.  In  this  case,  it  is 
necessary  to  shift  to  the  United  States  as  the  frame  of  reference.  The  relative  spatial 
locations  of  the  two  states  within  the  United  States  will  lead  to  the  typical  error  (i.e., 
believing  that  Reno  is  farther  east  than  San  Diego,  when  it  is  actually  farther  west). 

We  are  unable  to  provide  a  full  discussion  of  the  mechanisms  that  would  support 
these  operations  in  this  paper.  However,  the  key  point  with  regard  to  hierarchical 
encoding  is  that  each  item  encoded  by  the  system  is  represented  within  an  exocentric 
reference frame based upon local, salient features of the environment. Hierarchical
phenomena  arise  because  that  reference  frame  is  then  represented  as  an  item  in  a 
larger  reference  frame.  Thus,  San  Diego  (the  item)  is  positioned  at  a  particular 
location  within  the  state  of  California  (the  reference  frame).  However,  California 
occupies  a  particular  location  within  the  United  States.  Our  assumption  that  spatial 
comparisons  must  be  carried  out  within  the  same  reference  frame  provides  the 
explanation  for  why  various  hierarchical  phenomena  are  found  in  spatial  tasks. 
Mentally  re-encoding  location  relative  to  a  new  reference  frame  takes  time  and  results 
in  increased  error  and  bias. 
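A minimal Python sketch of this nesting, assuming each frame stores its origin as an offset within its parent frame and each item stores only local coordinates. The names (`Frame`, `global_position`, `judge_east_of_heuristic`, `judge_east_of_reencoded`) and all coordinates are hypothetical toy values chosen only to reproduce the qualitative pattern (Nevada's frame lying east of California's while Reno lies west of San Diego); they are not map data. The heuristic judgment compares the superordinate frames instead of re-encoding the items, which is one way the error described above could arise.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Frame:
    """A reference frame whose origin is given relative to a parent frame."""
    name: str
    origin: Tuple[float, float]        # (east, north) offset in the parent frame
    parent: Optional["Frame"] = None


def global_position(frame, local):
    """Re-encode a local (east, north) position into the root frame."""
    east, north = local
    while frame is not None:
        east += frame.origin[0]
        north += frame.origin[1]
        frame = frame.parent
    return east, north


usa = Frame("United States", (0.0, 0.0))
california = Frame("California", (0.0, 0.0), usa)
nevada = Frame("Nevada", (4.0, 2.0), usa)   # toy: Nevada's origin lies east of California's

# Items are stored only in their local (state) frames; toy coordinates.
san_diego = (california, (5.0, -3.0))
reno = (nevada, (-0.5, 3.0))


def judge_east_of_heuristic(a, b):
    """Biased judgment: compare only the items' superordinate frames."""
    return a[0].origin[0] > b[0].origin[0]


def judge_east_of_reencoded(a, b):
    """Effortful judgment: re-encode both items in a common (root) frame."""
    return global_position(*a)[0] > global_position(*b)[0]


print(judge_east_of_heuristic(reno, san_diego))   # True  (the typical error)
print(judge_east_of_reencoded(reno, san_diego))   # False (Reno is farther west)
```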

1.3  The  Imagery  Debate 

Finally,  mental  imagery  has  generated  a  substantial  amount  of  research  and  theorizing 
throughout  the  history  of  psychology  [13],  [14],  [15],  [22],  [44],  [45].  A  major  issue 
under  debate  has  been  whether  visual  mental  images  are  depictive.  That  is,  do  mental 
images  have  a  spatial  extent  (in  the  brain)  that  preserves  the  spatial  properties  of  the 
original  stimulus?  More  generally,  the  question  concerns  an  issue  of  whether  mental 
images  are  encoded  in  a  format  that  is  distinct  from  other  kinds  of  information  stored 
in the brain.

To  resolve  this  issue,  we  look  to  the  representations  and  mechanisms  in  the  ACT-R 
architecture.  ACT-R  posits  a  number  of  processing  modules,  which  are  responsible 
for  different  aspects  of  cognition.  In  the  architecture,  there  is  a  vision  module,  which 
is  specialized  for  processing  visual  perceptual  information.  We  agree  with  Kosslyn 
and  others  that  mental  imagery  utilizes  many  of  the  same  cortical  areas  and  neural 
pathways  as  vision  [22],  [46].  Consequently,  our  theory  tightly  couples  mechanisms 
for  mental  imagery  with  existing  architectural  mechanisms  for  visual  perception.  The 
result  is  that  vision  and  mental  imagery  operate  on  the  same  representations,  which 
are  different  from  other  information  in  declarative  memory.  It  is  interesting  to  note, 
however,  that  this  distinction  is  based  largely  on  content.  All  declarative  knowledge 
in  ACT-R,  including  visual  chunks,  is  represented  propositionally.  Thus,  while  visual 
knowledge  is  distinct,  the  representation  is  not  necessarily  qualitatively  different  from 
other  knowledge  in  memory.  This  speaks  to  the  more  detailed  issue  of  whether  visual 
mental  images  are  depictive  in  a  real  sense.  One  reason  for  propositional 
representations  of  visual  information  in  ACT-R  is  the  architecture’s  relatively  abstract 
and  lean  representation  of  visual  information.  However,  it  is  also  the  case  that 
propositional  representations  are  more  in  line  with  the  existing  architecture.  To  the 
extent  possible,  we  are  working  within  the  overall  structure  of  the  architecture,  until 
evidence  arises  that  forces  us  to  rethink  some  of  these  assumptions.  For  now,  we 
believe  that  the  representation  of  visual  information  currently  instantiated  in  ACT-R 
provides  an  adequate  foundation  that  supports  the  additional  representational 
components  and  mechanisms  we  intend  to  implement. 


2  Unified  Theories  of  Cognition  and  ACT-R 

Cognitive  architectures,  like  ACT-R,  EPIC,  and  Soar,  instantiate  a  theory  of  the 
human  information  processing  system  in  its  entirety.  These  unified  theories  of 
cognition [47] contain mechanisms to account for various aspects of human cognitive
functioning,  including  problem  solving,  perception,  and  motor  actions  [26],  [47],  [48]. 
One  of  the  challenges  associated  with  developing  a  cognitive  architecture  is 
identifying  an  appropriate  set  of  mechanisms,  which  are  not  only  capable  of 
producing  solutions  to  a  broad  range  of  tasks  faced  by  humans,  but  which  solve  those 
tasks  in  a  psychologically  plausible  manner.  Because  of  the  prevalence  of  spatial 
information  processing  in  human  cognition  and  performance,  it  is  critical  to 
incorporate  mechanisms  for  spatial  processing  in  these  theories,  particularly  as 
cognitive  architectures  are  applied  to  increasingly  complex,  spatially  rich  tasks.  In 
addition,  however,  it  is  vital  that  theories  of  spatial  competence  take  seriously  the 
constraints  imposed  by  other  components  of  the  human  cognitive  system,  many  of 
which have been implemented in cognitive architectures. Human perception and
action are constrained in ways that can significantly influence performance on spatial
tasks. In addition, human cognitive limitations, like working memory capacity and
long-term memory decay, moderate how spatial information is processed and
remembered.  In  short,  theories  of  human  cognition  cannot  ignore  spatial  information 
processing,  just  as  theories  of  spatial  competence  must  take  into  account  other 
perceptual,  cognitive,  and  motor  mechanisms. 

For  the  most  part,  unfortunately,  these  research  communities  have  remained 
disconnected.  Our  intent  is  to  incorporate  what  is  known  about  human  spatial 
competence  into  a  cognitive  architecture  to  facilitate  developing  more  precise,  and 
psychologically  valid,  quantitative  accounts  of  human  performance  on  complex, 
spatially-demanding  tasks.  Researchers  in  the  area  of  spatial  cognition  have 
developed  a  variety  of  theories  to  account  for  human  performance  in  different  spatial 
information  processing  domains  (e.g.,  [19],  [20],  [22]).  These  theories  capture 
important  capacities  and  limitations  of  human  spatial  ability.  However,  they  are 
often  not  implemented.  And,  when  they  are,  they  are  typically  not  implemented  as 
part  of  a  more  comprehensive  theory  of  human  cognition  (e.g.,  [21],  [49],  [50]).  In 
the  remainder  of  this  paper,  we  describe  our  proposal  for  linking  the  insights  of 
this  research  to  a  sophisticated,  yet  general,  computational  theory  of  the  human 
information  processing  architecture. 

2.1  ACT-R 

A full description of the ACT-R architecture is beyond the scope of this paper.
Thus,  only  a  brief  sketch  is  given  here.  More  detailed  descriptions  can  be  found 
elsewhere  (e.g.,  [26],  [51]).  ACT-R  is  a  cognitive  architecture  with  a  set  of  core 
mechanisms  that  has  been  used  to  provide  accounts  of  human  performance  across  a 
broad  range  of  research  domains  (see  [51]  for  a  review).  At  the  highest  level,  ACT-R 
is  a  serial  production  system  where  productions  (condition-action  pairs)  are  matched 
against  the  current  state  of  the  system.  On  each  cycle,  a  single  production  is  selected 
and  executed  (fired),  which  produces  a  change  in  the  state  of  the  system,  and  the  cycle 
begins  again.  The  current  state  in  ACT-R  is  defined  by  the  contents  of  a  set  of  buffers. 
Each  buffer  is  associated  with  a  specialized  processing  module,  and  serves  as  the 
interface  between  the  module  and  the  production  system.  We  mentioned  the  vision 
module above, which has a buffer to represent object properties (what), and a second
buffer to represent location information (where). There is also a declarative memory
module with a retrieval buffer, which is specialized for storing and processing
declarative  knowledge  (facts  and  information  stored  as  chunks).  Each  buffer  may  hold 
only  a  single  chunk  at  any  given  time,  and  each  module  can  process  only  a  single 
request  at  a  time.  Thus,  modules  and  buffers  are  serial  as  well.  Parallelism  exists  in 
ACT-R  through  the  simultaneous  operation  of  all  of  the  modules.  Subsymbolic 
mechanisms  are  implemented  within  the  modules  and  produce  a  graded  quality  in 
cognitive  processes.  The  speed  and  accuracy  of  operations  are  impacted  by 
continuously  varying  quantities,  like  activation  for  declarative  knowledge  and  utility 
values  for  productions. 
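The cycle just described can be illustrated with a deliberately simplified Python sketch. It is not ACT-R's implementation (ACT-R 6.0 is written in Lisp), and every name here (`Production`, `Module`, `run_cycle`, the buffer and slot names, the utility values) is hypothetical. It assumes that conflict resolution picks the matching production with the highest utility (ACT-R also adds noise, omitted here), that each buffer holds at most one chunk, and that a module refuses a new request while a previous one is still pending. Timing is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Chunk = Dict[str, str]                  # a chunk as slot -> value pairs
Buffers = Dict[str, Optional[Chunk]]    # buffer name -> at most one chunk


@dataclass
class Production:
    name: str
    condition: Callable[[Buffers], bool]   # tests the current buffer contents
    action: Callable[[Buffers], None]      # changes buffers and/or issues requests
    utility: float


class Module:
    """Toy module: accepts one request at a time and delivers the result to its buffer."""

    def __init__(self, buffer_name: str):
        self.buffer_name = buffer_name
        self.pending: Optional[Chunk] = None

    def request(self, result: Chunk) -> bool:
        if self.pending is not None:
            return False                    # busy: modules process requests serially
        self.pending = result
        return True

    def complete(self, buffers: Buffers) -> None:
        if self.pending is not None:
            buffers[self.buffer_name] = self.pending   # replaces any previous chunk
            self.pending = None


def run_cycle(productions, buffers: Buffers) -> Optional[str]:
    """One match-select-fire cycle; returns the fired production's name, if any."""
    matching = [p for p in productions if p.condition(buffers)]
    if not matching:
        return None
    chosen = max(matching, key=lambda p: p.utility)
    chosen.action(buffers)
    return chosen.name


vision = Module("visual-object")
buffers: Buffers = {"goal": {"state": "find-target"}, "visual-object": None}


def attend(b: Buffers) -> None:
    vision.request({"isa": "object", "color": "blue"})   # hypothetical chunk
    b["goal"] = {"state": "attending"}


def finish(b: Buffers) -> None:
    b["goal"] = {"state": "done"}


productions = [
    Production("attend-target", lambda b: b["goal"]["state"] == "find-target", attend, 2.0),
    Production("encode-target",
               lambda b: b["goal"]["state"] == "attending" and b["visual-object"] is not None,
               finish, 1.0),
]

print(run_cycle(productions, buffers))   # attend-target
vision.complete(buffers)                 # the module returns its result to the buffer
print(run_cycle(productions, buffers))   # encode-target
print(buffers["goal"])                   # {'state': 'done'}
```

Parallelism in this toy would come from several `Module` objects holding pending requests at the same time, while `run_cycle` stays strictly serial.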


Fig.  1.  Schematic  illustration  of  the  current  ACT-R  architecture,  with  proposed  additions 
included.  Structures  identified  in  white  represent  existing  components  of  the  architecture.  Grey 
components represent proposed additions. The environment is indicated in black.


The  modules  generally  are  driven  by  requests  from  the  production  system.  For 
instance,  a  production  may  request  a  shift  of  visual  attention.  The  module  processes 
the  request  and  returns  the  result  to  the  buffer,  where  it  can  be  accessed  by  the 
production  system.  In  the  case  of  shifting  attention,  the  vision  module  plans  and 
executes  the  action,  and  a  chunk  representing  the  item  being  attended  is  placed  into 
the  visual-object  buffer.  Figure  1  illustrates  the  major  components  of  the  current 
ACT-R  architecture,  along  with  the  additions  that  are  proposed  in  this  paper.  The 
current  version  of  ACT-R  (ACT-R  6.0)  has  been  designed  and  implemented  to 
support  adding,  modifying,  or  deleting  components,  out  of  an  appreciation  of  the 
limitations  of  the  current  architecture  and  interest  in  having  research  explore 
alternative accounts of cognitive phenomena [26]. This makes ACT-R well-suited for
exploring  how  to  account  for  human  spatial  competence  in  the  context  of  a  unified 
theory  of  cognition.  In  the  next  section,  we  describe  our  suggested  modifications  and 
additions,  including  how  they  integrate  and  interact  with  the  existing  architecture. 


3  An  Architectural  View  of  Spatial  Competence 

Our  account  of  spatial  competence  in  ACT-R  consists  of  proposals  to  add  a  module 
and  several  buffers  to  the  architecture,  in  conjunction  with  mechanisms  to  support  the 
kinds  of  processing  performed  by  humans  on  spatial  information.  In  general,  this 
proposal  is  in  line  with  the  existing  architecture,  and  with  existing  practice  within  the 
ACT-R  community.  One  exception  to  this  is  an  explicit  proposal  for  direct 
communication  between  modules.  Although  such  connections  do  not  exist  in  the 
architecture  currently,  there  is  a  recognition  that  they  are  likely  to  exist,  based  both  on 
human  performance  and  neuroanatomy  (Anderson,  personal  communication).  We 
propose  a  close  link  between  the  new  spatial  module  and  other  modules  in  the 
architecture,  particularly  the  vision  and  motor  modules.  Overall,  we  have  taken  care 
to  ensure  that  the  proposal  is  consistent,  both  internally  and  with  ACT-R.  Thus,  we 
are  confident  that  the  emerging  account  provides  a  useful  conceptualization  of  how 
humans  encode,  store,  and  process  spatial  information. 

3.1  Enhanced  Visual  Representation 

The  existing  representation  of  visual  information  in  ACT-R  is  based  substantially  on 
the  EPIC  architecture  [48].  It  represents  visual  information  by  splitting  object 
information  from  location  information,  following  the  research  of  Ungerleider  & 
Mishkin  [52].  However,  these  representations  are  impoverished,  due  to  both  historical 
and  technical  reasons.  Cognitive  architectures  certainly  have  not  solved  the  vision 
problem,  nor  does  our  theory.  However,  we  propose  to  augment  the  existing 
representation  of  visual  information,  specifically  location  information,  to  provide  a 
more  psychologically  valid  representation  that  is  able  to  support  spatial  operations. 

The  basic  functioning  of  the  vision  module  in  ACT-R  is  that  the  contents  of  the 
screen  are  processed  into  the  visual  icon,  which  is  a  transient  representation  in  a 
retinotopic  frame  of  reference  (actually,  locations  are  based  on  screen  coordinates  out 
of  convenience),  which  is  similar  to  a  feature  map  [53].  Although  the  ACT-R  visual 
icon  is  not  depictive  in  the  sense  that  Kosslyn’s  [22]  visual  buffer  is,  we  propose  that 
the  icon  serves  similar  functions  with  regard  to  the  construction  and  use  of  visual 
imagery  (Section  3.2).  Shifts  of  attention  in  ACT-R  occur  when  a  production  includes 
a  request  for  an  attention  shift,  which  specifies  constraints  on  where  attention  should 
go.  These  constraints  can  be  based  on  location  (e.g.,  to  the  right  of  where  attention  is 
currently),  and  features  of  the  objects  displayed  (e.g.,  only  blue  objects).  The 
constraints  are  compared  to  the  information  available  in  the  icon,  and  the  items  that 
match  those  constraints  are  identified.  One  of  the  items  matching  the  request  is 
selected  (randomly  if  there  are  multiple  items  that  match),  and  attention  is  shifted  to 
the new location. Once attention “arrives,” the visual buffers are populated with
chunks representing information about the object. The timing of these operations is
based  on  a  vast  psychophysical  literature. 
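The selection step can be sketched in Python as a filter over the icon followed by a random choice among the matches. This is a simplified illustration, assuming icon entries are plain dictionaries holding screen coordinates and features, and that a request consists of a feature dictionary plus an optional spatial predicate; the names (`find_location`, `right_of`) are hypothetical and do not reproduce ACT-R's actual request syntax. Timing is omitted.

```python
import random
from typing import Callable, Dict, List, Optional

IconEntry = Dict[str, object]   # e.g. {"x": 50, "y": 40, "color": "blue", "kind": "text"}


def right_of(current_x: float) -> Callable[[IconEntry], bool]:
    """Spatial constraint: the location lies to the right of current attention."""
    return lambda item: item["x"] > current_x


def find_location(icon: List[IconEntry],
                  features: Dict[str, object],
                  spatial: Optional[Callable[[IconEntry], bool]] = None,
                  rng: Optional[random.Random] = None) -> Optional[IconEntry]:
    """Return one icon entry matching all constraints, chosen at random if several match."""
    rng = rng or random.Random()
    matches = [item for item in icon
               if all(item.get(k) == v for k, v in features.items())
               and (spatial is None or spatial(item))]
    return rng.choice(matches) if matches else None


icon = [
    {"x": 50, "y": 40, "color": "blue", "kind": "text"},
    {"x": 210, "y": 40, "color": "red", "kind": "button"},
    {"x": 300, "y": 90, "color": "blue", "kind": "button"},
]

# "Only blue objects, to the right of where attention currently is (x = 100)."
target = find_location(icon, {"color": "blue"}, spatial=right_of(100))
print(target)   # {'x': 300, 'y': 90, 'color': 'blue', 'kind': 'button'}
```

With only a feature constraint (`{"color": "blue"}`), two entries match and either may be returned, mirroring the random selection described in the text.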

Egocentric  Buffer.  The  first  step  in  augmenting  the  representation  of  location  in 
ACT-R  consists  of  adding  a  buffer  to  hold  a  representation  of  the  egocentric  location 
of  the  object  (the  egocentric  buffer  in  Figure  1).  Currently,  ACT-R’s  visual-location 
buffer  holds  the  location  of  the  object  using  screen-based  coordinates.  This  buffer 
includes  other  featural  information  about  the  object,  including  its  size,  color,  and  type 
(e.g.,  text  versus  button),  which  corresponds  roughly  to  the  information  available  in 
the  visual  icon.  In  practice,  this  representation  is  used  primarily  to  support  visual 
search,  but  it  also  supports  processing  of  2D  displays,  as  are  commonly  used  in 
psychological  experiments. 

What  the  existing  representation  does  not  support  is  encoding  location  in  3D  space. 
Historically, this has not been problematic, since ACT-R (like other cognitive
architectures)  generally  has  not  been  applied  to  tasks  involving  complex,  3D 
environments.  This  kind  of  task  environment,  however,  is  becoming  increasingly 
common  as  the  applications  of  work  on  computational  cognitive  modeling  continue  to 
expand  (e.g.,  [54],  [55],  [56]).  To  address  this  shortcoming,  we  propose  adding  an 
egocentric  buffer  to  hold  3D  spatial  information.  Information  encoded  in  this  buffer 
includes  the  distance  of  an  object,  as  well  as  its  bearing,  relative  to  the  location  of 
ACT-R  in  the  environment.  It  also  includes  an  estimate  of  the  absolute  size  of  the 
object  (i.e.,  not  retinal  size),  as  well  as  the  orientation  of  the  object  and  information 
about  motion  (speed  and  direction).  Like  existing  buffers  in  the  vision  module,  this 
information  is  encoded  and  updated  when  visual  attention  is  shifted  to  a  particular 
object.  Note  that  the  existing  visual-location  buffer  remains  essential.  We  believe  that 
visual  information  represented  at  the  level  of  features  in  a  retinotopic  frame  of 
reference  is  necessary  in  the  control  of  visual  attention. 
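One way to picture the proposed egocentric chunk is as a record with the slots listed above. The Python sketch below is a hypothetical rendering: the slot names, units (meters, radians, m/s), and the `from_observation` helper are assumptions for illustration. It estimates absolute size from angular (retinal) size and distance using the standard geometric relation size = 2 · d · tan(θ/2); the text does not specify how the proposed buffer would actually compute this estimate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EgocentricChunk:
    """Hypothetical slots for a chunk in the proposed egocentric buffer."""
    distance: float        # meters from the viewer
    bearing: float         # radians, relative to the viewer's heading
    absolute_size: float   # meters (not retinal size)
    orientation: float     # radians, the object's facing relative to the viewer
    speed: float           # m/s
    direction: float       # radians, direction of motion relative to the viewer


def absolute_size(distance: float, visual_angle: float) -> float:
    """Physical extent that subtends visual_angle (radians) at the given distance."""
    return 2.0 * distance * math.tan(visual_angle / 2.0)


def from_observation(distance, bearing, visual_angle,
                     orientation=0.0, speed=0.0, direction=0.0) -> EgocentricChunk:
    """Build a chunk when attention shifts to an object (toy encoding step)."""
    return EgocentricChunk(distance, bearing,
                           absolute_size(distance, visual_angle),
                           orientation, speed, direction)


# The same retinal angle implies a larger object when it is farther away.
near = from_observation(distance=2.0, bearing=0.1, visual_angle=math.radians(10))
far = from_observation(distance=20.0, bearing=0.1, visual_angle=math.radians(10))
print(round(near.absolute_size, 3), round(far.absolute_size, 3))   # 0.35 3.5
```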

Environmental Frame of Reference Buffer. As important as an egocentric encoding
of  location  is  for  immediate  action  and  processing,  it  does  not  provide  any 
information  about  the  location  of  an  object  relative  to  other  objects  in  the  world.  In 
Section  1.1,  we  indicated  that  representing  location  information  utilizes  multiple 
frames  of  reference.  One  of  these  is  based  upon  the  surrounding  environment.  In 
virtually  any  space,  there  are  distinct  features  that  provide  a  frame  of  reference  for 
encoding  relative  locations  of  objects.  This  may  be  a  landmark,  like  the  Eiffel  Tower 
in  Paris,  or  geographic  feature,  like  the  Pacific  Ocean  on  the  California  coast  in  the 
United  States.  We  propose  that  the  human  visual  system  takes  advantage  of  these 
features  to  provide  a  stable  frame  of  reference  for  encoding  object  location.  An 
interesting  question  exists  regarding  how  a  particular  reference  frame  is  selected 
within  an  environment  when  multiple  options  are  generally  available.  We  believe  that 
salience  plays  a  key  role  in  this  process.  However,  we  suspect  that  there  are  large 
individual  differences  in  this  process,  which  may  contribute  to  performance 
differences  in  orientation  and  navigation  tasks  [57],  [58]. 

The  contents  of  the  environmental  buffer  provide  a  basis  for  calculating  spatial 
relationships  among  objects.  Some  proposals  suggest  that  these  quantities  are 
computed  automatically  when  visual  attention  is  shifted  from  one  object  to  another 
(e.g.,  [59]).  In  contrast,  we  believe  that  identifying  the  spatial  relationship  between 
two objects is an explicit process of estimation. However, if objects are represented in
an  exocentric  frame  of  reference,  such  estimates  need  not  be  difficult  to  compute,  and 
can  be  determined  using  immediate  visual  perception  or  from  memory  using  mental 
imagery.  So,  while  you  may  not  have  explicitly  considered  the  distance  between  the 
phone  in  your  office  and  the  door,  you  can  recall  the  locations  of  both  items  (within 
the  reference  frame  of  your  office)  from  memory  and  compute  that  relationship  with 
relative ease when asked. If items are located in different frames of reference (e.g.,
the  stove  in  your  kitchen  and  the  computer  in  your  office),  additional  effort  is  needed 
to  establish  a  common  frame  of  reference,  but  the  process  will  be  similar.  Mental 
imagery,  described  below,  supports  these  operations. 
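A rough Python sketch of this difference, assuming a tree of reference frames in which each frame stores the position of its origin within its parent, and counting each frame-to-parent re-encoding as one step (a hypothetical proxy for the additional effort, not a claim about its actual cost). The house layout, the frame names, and all coordinates are invented for illustration, and frames are assumed to share one orientation so that only translations are needed.

```python
import math
from typing import Dict, Optional, Tuple

# frame name -> (parent frame, origin of this frame in the parent's coordinates)
FRAMES: Dict[str, Tuple[Optional[str], Tuple[float, float]]] = {
    "house": (None, (0.0, 0.0)),
    "office": ("house", (8.0, 0.0)),
    "kitchen": ("house", (0.0, 5.0)),
}


def ancestors(frame):
    """Frames from `frame` up to the root, inclusive."""
    chain = [frame]
    while FRAMES[chain[-1]][0] is not None:
        chain.append(FRAMES[chain[-1]][0])
    return chain


def lift(frame, pos, target):
    """Re-encode pos from `frame` into its ancestor `target`; return (pos, steps)."""
    x, y = pos
    steps = 0
    while frame != target:
        parent, (ox, oy) = FRAMES[frame]
        x, y = x + ox, y + oy
        frame = parent
        steps += 1
    return (x, y), steps


def relation(a, b):
    """Distance between two items given as (frame, local position), plus re-encoding steps."""
    chain_b = ancestors(b[0])
    common = next(f for f in ancestors(a[0]) if f in chain_b)
    pa, sa = lift(a[0], a[1], common)
    pb, sb = lift(b[0], b[1], common)
    return math.dist(pa, pb), sa + sb


phone, door = ("office", (1.0, 1.0)), ("office", (4.0, 5.0))
stove, computer = ("kitchen", (2.0, 1.0)), ("office", (1.0, 1.0))
print(relation(phone, door))       # same frame: distance 5.0, no re-encoding steps
print(relation(stove, computer))   # different frames: two steps through "house"
```

The computation itself is the same in both cases; only the number of re-encodings into a common frame differs, which is the sense in which the text says the process "will be similar."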

Episodic  Buffer.  The  final  modification  we  propose  to  the  vision  module  provides  a 
means  for  consolidating  visual  experiences  into  a  unitary  representation.  We 
accomplish  this  by  proposing  an  episodic  buffer,  which  links  the  contents  of  the 
visual  buffers,  and  produces  an  episodic  trace  of  the  experience.  Our  proposal  bears 
significant  resemblance  to  Kosslyn’s  conceptualization  of  the  role  of  the  hippocampus 
in  representing  episodic  information  [22].  Specifically,  we  do  not  propose  that  all  of 
the  information  related  to  a  visual  experience  is  represented  in  this  buffer.  Rather,  we 
propose  that  this  buffer  holds  a  chunk  that  encodes  pointers  to  the  contents  of  the 
other  visual  buffers  (other  chunks). 

The  resulting  vision  module  should  operate  as  follows.  When  attention  is  shifted  to 
a new object in the visual field, the vision module updates the visual-object,
visual-location, environmental, and egocentric buffers with chunks that represent the
information  about  the  object  being  attended.  These  processes  occur  in  parallel, 
through  distinct  mechanisms  in  ACT-R,  and  distinct  cortical  pathways  in  the  brain 
(see  Section  4).  Identifiers  for  those  chunks  are  specified  as  slot  values  in  a  chunk  in 
the  episodic  buffer,  linking  them  together  in  a  single  episodic  representation.  All  of 
these  chunks  are  deposited  into  declarative  memory,  making  the  information 
accessible  at  later  times.  In  addition,  these  chunks  also  are  subject  to  the  same 
activation  learning  and  decay  mechanisms  as  other  chunks  in  memory,  meaning  that 
perceptual  experience  can  be  forgotten  much  like  other  information.  These 
mechanisms  already  exist  in  ACT-R.  The  chunks  stored  in  memory  form  the  basis  for 
mental  imagery,  which  is  discussed  next. 
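The pointer-based episodic representation described above can be sketched as a small data structure. This is a minimal illustration, not ACT-R code. The `Chunk` and `DeclarativeMemory` classes, the `attend` function, the chunk-naming scheme, and the slot names (`color`, `distance`, `bearing`, and so on) are all hypothetical. The sketch shows that the episodic chunk stores only the identifiers of the chunks in the other visual buffers, and that every chunk, episodic or not, is deposited in declarative memory.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Chunk:
    """Minimal stand-in for an ACT-R chunk: a unique name plus slot-value pairs."""
    name: str
    slots: Dict[str, object] = field(default_factory=dict)


class DeclarativeMemory:
    """Hypothetical store that keeps chunks by name so pointers can be resolved."""

    def __init__(self) -> None:
        self.chunks: Dict[str, Chunk] = {}

    def add(self, chunk: Chunk) -> None:
        self.chunks[chunk.name] = chunk

    def get(self, name: str) -> Chunk:
        return self.chunks[name]


def attend(memory: DeclarativeMemory, obj_id: str, features: Dict[str, object],
           screen_location: Dict[str, object], env_position: Dict[str, object],
           egocentric: Dict[str, object]) -> Chunk:
    """Create one chunk per visual buffer for the attended object, link them
    through an episodic chunk that holds only their names, and store all five."""
    buffer_chunks = {
        "visual-object": Chunk(f"{obj_id}-object", dict(features)),
        "visual-location": Chunk(f"{obj_id}-location", dict(screen_location)),
        "environmental": Chunk(f"{obj_id}-environmental", dict(env_position)),
        "egocentric": Chunk(f"{obj_id}-egocentric", dict(egocentric)),
    }
    # The episodic chunk's slot values are pointers (chunk names), not copies.
    episode = Chunk(f"{obj_id}-episode",
                    {buffer: chunk.name for buffer, chunk in buffer_chunks.items()})
    for chunk in [*buffer_chunks.values(), episode]:
        memory.add(chunk)
    return episode


memory = DeclarativeMemory()
episode = attend(memory, "cup", {"color": "white"}, {"x": 120, "y": 80},
                 {"x": 2.0, "y": 3.5}, {"distance": 0.6, "bearing": 15.0})
print(episode.slots["egocentric"])                                # cup-egocentric
print(memory.get(episode.slots["egocentric"]).slots["distance"])  # 0.6
```

Keeping only names in the episodic chunk mirrors the claim that the episodic buffer does not duplicate the information held in the other buffers.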

3.2  Mental  Imagery 

There  is  a  great  deal  of  evidence  suggesting  that  engaging  in  mental  imagery  recruits 
many  of  the  same  areas  of  the  brain  as  visual  perception  (cf.  [22]).  Based  on  this
literature,  we  find  it  appropriate  to  posit  a  close  link  between  visual  perception  and 
imagery.  In  fact,  our  claim,  in  line  with  Kosslyn,  is  that  mental  imagery  does  not 
reflect  a  distinct  and  separable  component  of  human  cognition.  Rather,  in  this 
architecture,  mental  imagery  operates  through  the  mechanisms  associated  with  visual 
perception.  Note  that  in  Figure  1,  there  is  no  imagery  module  and  no  imaginal  buffer, 
in  contrast  with  Anderson  et  al.  [26].  We  achieve  the  functionality  associated  with 
mental  imagery  through  the  interaction  of  the  vision  module  with  a  spatial  module 
that  incorporates  default  features  of  modules  within  ACT-R. 
Images  are  generated  in  this  architecture  by  retrieving  episodic  perceptual
experiences  from  memory.  This  process  works  similarly  to  retrieving  declarative 
knowledge  in  ACT-R,  such  that  a  request  is  generated  by  the  central  production 
system  to  retrieve  an  episodic  chunk  from  memory,  which  is  placed  in  the  episodic 
buffer,  with  pointers  to  the  chunks  associated  with  this  memory  from  the  other  visual 
buffers.  These  chunks  can  be  retrieved  based  on  these  references  and  the  information 
can  be  propagated  to  the  visual  icon,  where  it  is  reinstantiated.  This  process  causes 
attention  to  be  “pulled  from”  the  external  environment  and  focused  on  this  internally- 
generated  visual  representation.  The  result  is,  essentially,  a  copy  of  the  original  visual 
experience.  However,  a  variety  of  errors  could  occur  in  this  process,  including  mis- 
retrievals,  which  would  affect  the  characteristics  of  the  mental  image. 
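Image generation by retrieval can be sketched in the same spirit. In this hypothetical illustration, `generate_image`, `retrieve`, the activation values, and the Gaussian noise term are assumptions standing in for ACT-R's retrieval competition. The sketch shows an episodic chunk winning retrieval, its pointers being followed, and the referenced chunks being copied into a stand-in for the visual icon. A nonzero `noise_sd` lets a competing episode occasionally win, which is one way to model the mis-retrievals mentioned above.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical declarative memory: chunk name -> slot dictionary. Episodic chunks
# hold pointers (names of other chunks); the other chunks hold visual features.
Memory = Dict[str, Dict[str, object]]


def retrieve(candidates: List[str], activation: Dict[str, float],
             noise_sd: float, rng: random.Random) -> str:
    """The candidate with the highest noisy activation wins. With noise_sd > 0
    a competitor can sometimes win, i.e. a mis-retrieval."""
    return max(candidates, key=lambda c: activation[c] + rng.gauss(0.0, noise_sd))


def generate_image(memory: Memory, candidates: List[str], activation: Dict[str, float],
                   noise_sd: float = 0.0,
                   seed: int = 0) -> Tuple[str, Dict[str, Dict[str, object]]]:
    """Retrieve an episodic chunk, follow its pointers, and copy the referenced
    chunks into a stand-in for the visual icon (buffer name -> slots)."""
    rng = random.Random(seed)
    episode = retrieve(candidates, activation, noise_sd, rng)
    # Copies, so later changes to the image do not alter the stored memory.
    icon = {buffer: dict(memory[pointer]) for buffer, pointer in memory[episode].items()}
    return episode, icon


memory: Memory = {
    "ep-cup": {"visual-object": "cup-object", "egocentric": "cup-ego"},
    "ep-mug": {"visual-object": "mug-object", "egocentric": "mug-ego"},
    "cup-object": {"color": "white"},
    "cup-ego": {"distance": 0.6, "bearing": 15.0},
    "mug-object": {"color": "red"},
    "mug-ego": {"distance": 0.9, "bearing": -30.0},
}
activation = {"ep-cup": 1.2, "ep-mug": 0.8}
episode, icon = generate_image(memory, ["ep-cup", "ep-mug"], activation)
print(episode, icon["visual-object"]["color"])  # ep-cup white
```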

As  the  preceding  description  suggests,  we  posit  that  mental  images  are  represented 
at  the  level  of  the  visual  icon  in  ACT-R.  In  the  current  implementation  of  ACT-R,  this 
is  a  propositional  representation  that  contains  feature-based  information  about  objects, 
including  spatial  location,  color,  and  size  [51].  Because  mental  images  are  effortful  to 
maintain,  we  posit  that  the  visual  icon  has  a  rapid  decay  rate.  It  is  only  by  refreshing 
information  that  it  can  be  maintained.  For  items  in  the  visual  world,  this  is  effortless, 
since  the  electromagnetic  radiation  impinging  on  the  retina  provides  constant  input 
regarding  visual  information  in  the  environment.  In  the  case  of  mental  images, 
however,  attention  is  required  to  maintain  the  image  for  any  significant  length  of  time. 
As  we  implement  this  architecture,  we  will  adapt  ACT-R’s  existing  declarative
memory  decay  function  for  use  with  this  component  of  the  system,  with  an 
appropriately  higher  decay  rate. 
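The decay proposal can be made concrete with ACT-R's base-level learning equation, B = ln(sum over j of t_j^(-d)), where t_j is the time in seconds since the j-th presentation (here, an encoding or refresh) and d is the decay rate. The sketch below uses d = 0.5, the value conventionally used for declarative memory in ACT-R. It also assumes a hypothetical icon rate of d = 2.0 and a hypothetical threshold of -1.0; neither value comes from this proposal. With a single presentation, B = -d ln(t), so the trace falls below a threshold tau once t exceeds exp(-tau/d). A higher d therefore shortens the window in which an unrefreshed image survives.

```python
import math
from typing import Sequence

DECLARATIVE_D = 0.5  # decay value conventionally used for ACT-R declarative memory
ICON_D = 2.0         # hypothetical, "appropriately higher" decay rate for the visual icon
THRESHOLD = -1.0     # hypothetical activation threshold below which the image is lost


def base_level(presentations: Sequence[float], now: float, d: float) -> float:
    """ACT-R base-level learning: B = ln(sum_j (now - t_j) ** -d).

    Only presentations (encodings or refreshes) strictly before `now` count.
    """
    ages = [now - t for t in presentations if now > t]
    if not ages:
        return float("-inf")
    return math.log(sum(age ** -d for age in ages))


def lifetime_after_single_presentation(d: float, threshold: float) -> float:
    """With one presentation B = -d * ln(age), which drops below `threshold`
    once age exceeds exp(-threshold / d)."""
    return math.exp(-threshold / d)


print(f"{lifetime_after_single_presentation(DECLARATIVE_D, THRESHOLD):.2f}")  # 7.39
print(f"{lifetime_after_single_presentation(ICON_D, THRESHOLD):.2f}")         # 1.65

# Refreshing the image (extra presentations) keeps its activation higher.
print(base_level([0.0, 1.0, 2.0], 2.5, ICON_D) > base_level([0.0], 2.5, ICON_D))  # True
```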

Of  course,  mental  images  can  be  modified  and  transformed.  The  mechanisms 
available  for  performing  these  transformations  are  described  next.  We  simply  note 
here  that  the  primitive  transformations  available  for  manipulating  mental  images 
relate  to  slot  values  in  the  chunks  created  during  visual  perception.  Although  a  variety
of  transformations  are  possible,  we  focus  in  the  next  section  on  spatial 
transformations,  which  are  generated  through  changes  to  slots  in  the  chunk  in  the 
egocentric  buffer. 

3.3  A  Specialized  Module  for  Processing  Spatial  (and  Magnitude)  Information 

Spatial  information  processing  is  a  component  of  many  tasks  and  cognitive  activities. 
Thus,  an  account  of  these  abilities  in  humans  must  be  both  general  and  powerful.  The 
modifications  to  the  vision  module  described  in  Section  3.1  are  essential  to  providing 
a  robust  representation  of  object  locations  in  the  environment.  However,  there  are  no 
mechanisms  in  the  vision  module  that  directly  support  spatial  transformations, 
estimations,  or  calculations.  While  our  discussion  centers  around  processing  spatial 
information  obtained  through  visual  perception,  we  accept  that  the  mechanisms  may 
generalize  to  other  concepts,  like  volume  in  auditory  perception,  brightness  in  color 
perception,  or  intensity  in  taste,  smell,  and  touch.  Since  ACT-R  has  only  rudimentary 
auditory  and  vocal  abilities  (and  no  sense  of  taste,  smell,  or  touch),  these  issues,  in 
large  part,  cannot  be  addressed  in  the  current  architecture.  They  do,  however,  offer  an 
interesting  direction  for  future  empirical  and  modeling  research. 

We  propose  that  mechanisms  for  processing  spatial  information  are  instantiated 
within  a  specialized  module  in  ACT-R.  There  are  several  key  components  of  this 
module,  which  we  will  examine  in  turn.  The  degree  to  which  the  different  capabilities
are  actually  separable  from  a  cognitive  or  neuropsychological  perspective  needs  to  be 
carefully  evaluated  with  additional  research.  In  this  section,  we  will  attempt  to 
differentiate  them  while  simultaneously  providing  evidence  that  they  are 
interdependent.  These  components  of  spatial  ability  include  spatial  transformations, 
magnitude  estimations,  and  magnitude  computations. 

Spatial  Transformations.  Humans  have  the  ability  to  maintain  and  manipulate 
mental  images  to  perform  a  number  of  tasks  and  activities,  from  mental  rotation  [7], 
to  image  composition  [60],  to  mental  simulation  [61],  [62].  The  ability  to  transform 
images  and  inspect  the  results  is  an  essential  component  of  spatial  reasoning  [49]. 
However,  humans  are  capable  of  many  different  kinds  of  complex  spatial  image 
transformations,  some  of  which  depend  heavily  on  knowledge  and  experience  and/or 
are  specific  to  particular  object  classes  (e.g.,  compressing  an  accordion-like  or  spring-like
object).  Rather  than  address  this  broader  class  of  complex  transformations,  we
will  initially  model  a  small  number  of  basic,  but  very  frequently  used  transformations. 

Perhaps  the  best-studied  transformation  is  mental  rotation.  Although
early  research  suggested  that  whole  objects  might  be  mentally  rotated  through 
intermediate  states  in  a  depictive  representation  [7],  subsequent  research  argues  for  a 
more  flexible  process  that  can  involve  focusing  on  individual  object  parts,  increasing 
their  imagined  size  if  necessary  to  make  a  fine  discrimination,  and  changing  their 
visualized  position  and  orientation  [6],  [22].  Therefore,  we  do  not  plan  to  model  basic
transformations  such  as  size,  position  and  orientation  by  directly  moving  the 
constituent  points  of  an  object  across  the  visual  icon.  Rather,  production  rules  will 
select  a  relevant  object  or  object  part  as  the  focus  of  attention,  resulting  in  its  position 
being  represented  in  the  egocentric  buffer.  The  production  system  will  also  select 
goal-relevant  image  transformation  processes  to  alter  the  relevant  slot  values  of  the 
selected  object/part.  These  transformations  include  translations,  zooming,  and  mental 
rotation,  which  can  be  achieved  by  manipulating  an  image’s  distance  and/or  bearing, 
size,  and  orientation  in  the  egocentric  buffer,  respectively.  The  vision  module,  using 
direct  module-to-module  links,  recruits  the  spatial  module  to  perform  the  operations. 
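Because the transformations act on slot values rather than on the points of an image, each primitive reduces to a small update of the egocentric chunk. The sketch below is a hypothetical illustration. The `EgocentricChunk` slot set, the units (metres and degrees), and the function names are assumptions, not part of this proposal or of ACT-R. Each function returns a new chunk and leaves the original intact.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EgocentricChunk:
    """Hypothetical egocentric-buffer chunk for an attended object or part."""
    obj: str
    distance: float     # metres from the viewer (assumed unit)
    bearing: float      # degrees clockwise from straight ahead, in [0, 360)
    size: float         # relative scale, 1.0 = as originally encoded
    orientation: float  # degrees, in [0, 360)


def translate(c: EgocentricChunk, d_distance: float = 0.0,
              d_bearing: float = 0.0) -> EgocentricChunk:
    """Translation: change where the object is, via distance and/or bearing."""
    return replace(c, distance=max(0.0, c.distance + d_distance),
                   bearing=(c.bearing + d_bearing) % 360.0)


def zoom(c: EgocentricChunk, factor: float) -> EgocentricChunk:
    """Zooming: rescale the imagined size (e.g., to make a fine discrimination)."""
    if factor <= 0:
        raise ValueError("zoom factor must be positive")
    return replace(c, size=c.size * factor)


def rotate(c: EgocentricChunk, degrees: float) -> EgocentricChunk:
    """Mental rotation: change only the orientation slot."""
    return replace(c, orientation=(c.orientation + degrees) % 360.0)


part = EgocentricChunk("handle", distance=1.0, bearing=10.0, size=1.0, orientation=350.0)
moved = rotate(zoom(translate(part, d_bearing=-20.0), 2.0), 20.0)
print(moved)
# EgocentricChunk(obj='handle', distance=1.0, bearing=350.0, size=2.0, orientation=10.0)
```

Composing the three functions, as in the last lines, is one simple way to express a series of transformations to size, location, and orientation of the kind the meteorologists' comments suggest.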

The  role  of  the  spatial  module  in  this  case  is  to  perform  the  requested 
transformations,  producing  alterations  to  the  representation  of  the  object  in  the  visual 
icon.  In  many  cases,  this  will  be  a  complex,  iterative  process  involving  several  objects 
or  parts  at  different  scales  and  in  different  locations.  Often  the  next  transformation 
subgoal  will  be  determined  only  after  the  system  inspects  the  results  of  the  previous 
transformation,  so  image  inspection  processes  will  go  hand-in-hand  with  spatial 
transformations.  In  addition,  the  decay  properties  of  mental  images,  and  perceptual 
refresh  rates  of  visual  stimuli  will  impact  the  size  of  the  subgoals  and  other  aspects  of 
these  transformation  mechanisms.  ACT-R  already  contains  processes  which  control 
the  inspection  of  simple  visual  information  via  the  allocation  of  attention  to  locations 
and  features  represented  in  the  icon. 
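The iterative interplay of transformation and inspection can be sketched as a loop in which each cycle inspects the current state and only then chooses the next sub-goal. In this hypothetical example, mental rotation toward a target orientation proceeds in increments of at most `step` degrees. The step size stands in for the limits that image decay and refresh rates might place on sub-goal size, and its default value is an assumption.

```python
from typing import List, Tuple


def angular_difference(current: float, target: float) -> float:
    """Smallest signed rotation (degrees) from current to target, in (-180, 180]."""
    diff = (target - current) % 360.0
    return diff - 360.0 if diff > 180.0 else diff


def rotate_until_aligned(orientation: float, target: float, step: float = 15.0,
                         tolerance: float = 1e-9,
                         max_cycles: int = 100) -> Tuple[float, int, List[float]]:
    """Alternate inspection and transformation: each cycle inspects the remaining
    disparity, then applies a rotation of at most `step` degrees toward the target.
    Returns the final orientation, the number of cycles, and the trace."""
    trace = [orientation]
    cycles = 0
    while cycles < max_cycles:
        remaining = angular_difference(orientation, target)  # inspect the image
        if abs(remaining) <= tolerance:
            break  # goal reached; no further sub-goal
        delta = max(-step, min(step, remaining))  # next sub-goal, chosen after inspection
        orientation = (orientation + delta) % 360.0  # transform
        trace.append(orientation)
        cycles += 1
    return orientation, cycles, trace


for disparity in (30.0, 90.0, 150.0):
    final, cycles, _ = rotate_until_aligned(0.0, disparity)
    print(disparity, final, cycles)  # more cycles for larger disparities: 2, 6, 10
```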

Although  the  number  of  simple  transformations  that  we  will  model  is  small,  they 
can  be  combined  using  the  process  just  described  to  create  complex  manipulations  of 
mental  images  in  the  service  of  spatial  cognition.  An  example  of  the  usefulness  of 
such  transformations  can  be  found  in  an  analysis  of  the  performance  of  expert 
meteorologists  (e.g.,  [62],  [63]).  The  comments  of  meteorologists  upon  viewing  a 
static  display  of  weather  patterns  give  evidence  that  they  are  generating  weather
predictions  by  imagining  a  complex  series  of  transformations  to  the  size,  location,  and 
orientation  of  regions  affected  by  various  meteorological  events. 

Magnitude  Estimations.  Just  as  the  transformation  component  of  the  spatial  module 
is  closely  linked  with  manipulating  mental  images,  magnitude  estimations  are 
associated  with  encoding  information  using  vision  or  mental  imagery.  According  to 
Klatzky  [33],  estimates  of  egocentric  distance  and  bearing  are  primitive  values  in 
egocentric  frames  of  reference.  In  addition,  however,  humans  are  able  to  estimate 
these  relations  between  arbitrary  pairs  of  objects  in  the  environment,  which  can  be 
useful  for  many  purposes,  including  navigation  [64].  Of  course,  there  is  bias  and  error 
associated  with  these  operations,  but  people  are  still  able  to  achieve  a  relatively  high 
degree  of  accuracy.  These  processes  also  appear  to  be  involved  in  planning  and 
executing  motor  movements,  like  reaching  to  grab  a  coffee  cup  on  the  desk  in  front  of 
you  (e.g.,  [65]).  Estimating  magnitudes  is  a  basic  function  of  the  spatial  module,  and 
is  another  point  at  which  we  posit  significant  module-to-module  communication.  In 
reaching  to  pick  up  a  coffee  cup,  for  example,  detailed  spatial  information  is 
necessary  to  plan  and  execute  the  appropriate  motor  actions  to  grasp  the  cup. 
Moreover,  people  perform  these  actions  precisely,  without  conscious  awareness  of  the 
spatial  information  that  is  influencing  their  motor  movements  [66].  Thus,  we  believe 
that  these  interactions  occur  through  cortical  pathways  outside  the  main  production 
cycle  in  ACT-R. 
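Estimating the relation between two arbitrary objects can be derived from the egocentric primitives Klatzky describes. This sketch assumes the viewer sits at the origin facing straight ahead, with bearings measured clockwise in degrees; these conventions and the function names are choices made for the illustration. Each object's (distance, bearing) pair converts to Cartesian coordinates, and the object-to-object distance and bearing follow from the difference vector. The distance agrees with the law of cosines, d^2 = d_a^2 + d_b^2 - 2 d_a d_b cos(theta_b - theta_a). The bias and error that characterize human estimates are deliberately omitted.

```python
import math
from typing import Tuple

Polar = Tuple[float, float]  # (distance, bearing in degrees clockwise from straight ahead)


def to_xy(obj: Polar) -> Tuple[float, float]:
    """Viewer at the origin facing +y; x increases to the viewer's right."""
    distance, bearing = obj
    rad = math.radians(bearing)
    return distance * math.sin(rad), distance * math.cos(rad)


def relation(a: Polar, b: Polar) -> Tuple[float, float]:
    """Distance from a to b and the bearing of b as seen from a, with the bearing
    expressed in the viewer's heading-aligned frame (degrees clockwise, [0, 360))."""
    ax, ay = to_xy(a)
    bx, by = to_xy(b)
    dx, dy = bx - ax, by - ay
    return math.hypot(dx, dy), math.degrees(math.atan2(dx, dy)) % 360.0


cup, lamp = (1.0, 0.0), (1.0, 90.0)  # 1 m straight ahead; 1 m to the right
distance, bearing = relation(cup, lamp)
print(round(distance, 3), round(bearing, 1))  # 1.414 135.0
```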

Under  this  perspective,  the  production  system  of  ACT-R  is  responsible  for 
formulating  the  high-level  action,  like  “pick  up  the  coffee  cup.”  The  motor  module  is 
then  responsible  for  determining  how  to  perform  that  action,  which  involves 
interaction  with  the  spatial  module  to  plan  the  details  of  the  motor  movements. 
Research  by  Brooks  [67]  suggests  that  spatial  information  processing  is  required  to 
plan  motor  movements.  It  also  suggests  that  there  is  an  overlap  between  these 
mechanisms  and  the  mechanisms  required  to  generate,  maintain,  and  inspect  mental 
images,  which  were  described  above. 

We  believe  that  magnitude  estimation  is  utilized  by  the  vision  module  as  well, 
when  a  new  item  is  attended  in  the  environment.  When  a  shift  of  attention  occurs, 
information  about  the  distance  and  bearing  of  the  object  with  respect  to  ACT-R’s 
location  in  the  environment  must  be  computed.  We  propose  that  these  operations 
recruit  the  spatial  module  as  well.  It  also  may  be  the  case  that  this  component  of  the 
spatial  module  is  involved  in  planning  and  executing  eye  movements  to  bring  new 
items  into  the  focus  of  attention  (e.g.,  [68]).  As  noted  above,  these  mechanisms  may 
be  applicable  more  broadly  for  computing  magnitudes  other  than  spatial  quantities. 
However,  consideration  of  those  possibilities  is  beyond  the  scope  of  this  proposal. 

Lastly,  the  mechanisms  of  estimation  can  be  engaged  by  the  production  system, 
through  the  spatial  buffer.  An  explicit  attempt  to  estimate  a  distance  or  bearing  from 
one  object  to  another  in  the  environment  would  be  an  example  of  how  central 
cognition  may  utilize  these  mechanisms.  Such  a  request  would  result  in  a  chunk, 
returned  into  the  spatial  buffer,  which  identifies  the  objects  and  the  relationship 
between  them.  Such  explicit  requests  form  the  basis  of  generating  a  cognitive  map 
within  this  architecture.  This  set  of  mechanisms  can  also  compute  qualitative 
estimates  of  magnitudes,  like  close,  above,  small,  and  far.  Some  research  has 
suggested  that  qualitative  (categorical)  operations  like  this  are  localized  in  the  left
hemisphere,  while  quantitative  (continuous)  operations  (e.g.,  distance/bearing 
estimates)  are  performed  in  the  right  hemisphere  [69],  [70]. 
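A request of the kind just described, and the chunk it returns, might look like the following. This is a hypothetical sketch. The slot names, the shared (x, y, z) frame, and the 1-metre cut-off separating "close" from "far" are assumptions, since no values are specified here. The sketch shows one returned chunk carrying both a quantitative (continuous) slot and categorical slots.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres in a shared frame; z is height

CLOSE_THRESHOLD_M = 1.0  # hypothetical cut-off between "close" and "far"


def spatial_request(name_a: str, pos_a: Point, name_b: str, pos_b: Point) -> Dict[str, object]:
    """Answer an explicit request about two objects with one chunk-like dict
    holding a continuous (quantitative) slot and categorical (qualitative) slots."""
    dx, dy, dz = (pb - pa for pa, pb in zip(pos_a, pos_b))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return {
        "isa": "spatial-relation",
        "from": name_a,
        "to": name_b,
        "distance": distance,                                             # quantitative
        "proximity": "close" if distance < CLOSE_THRESHOLD_M else "far",  # categorical
        # categorical: where the "to" object lies relative to the "from" object
        "vertical": "above" if dz > 0 else "below" if dz < 0 else "level",
    }


print(spatial_request("desk", (0.0, 0.0, 0.0), "door", (3.0, 4.0, 0.0)))
```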

Magnitude  Computations.  In  some  circumstances,  like  planning  an  attention  shift  or 
a  motor  movement,  estimates  of  magnitudes  can  be  useful  in  isolation.  However, 
some  of  the  most  important  functions  of  spatial  cognition  involve  performing 
computations  involving  multiple  magnitudes.  There  are  a  variety  of  computations  that 
may  be  performed  on  magnitude  information,  including  qualitative  comparisons  (e.g., 
<,  >,  =)  and  quantitative  operations  (e.g.,  +,  -,  /,  *).  Again,  these  different  types  of 
operations  may  be  performed  in  different  hemispheres  in  humans  (cf.  [69],  [70]).
This  is  a  sophisticated,  and  potentially  extensive,  set  of  operations  to  be  performed  on 
quantitative  information. 

We  propose  that  these  functions  are  computed  using  another  set  of  mechanisms 
within  the  spatial  module.  There  is  also  evidence  that  these  operations  are  conducted 
on  abstract  representations  of  magnitude,  rather  than  using  information  embedded  in 
vision  (e.g.,  distance  and  bearing)  or  another  modality.  Neuropsychological  research  has
shown  that  the  angular  gyrus,  in  the  posterior  inferior  parietal  lobule,  is  implicated  in 
the  processing  of  spatial  and  numerical  information  (e.g.,  [71],  [72]).  We  take  these 
findings  to  suggest  that  quantitative  information  of  this  sort  is  represented  in  a 
common  format  for  performing  computations  like  those  mentioned  above.  Thus,  we 
propose  that  the  outputs  of  estimation  processes  are  in  an  abstract,  propositional  form. 
Comparisons  and  computations,  then,  are  performed  on  this  abstract  representation. 
These  requests  are  generated  through  central  cognition,  utilizing  the  spatial  buffer 
mentioned  above. 
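The proposal that estimates are converted to a common, abstract format before comparison or computation can be sketched as follows. The `Magnitude` type, its `source` tag, the `tolerance` parameter, and the operator table are hypothetical. The sketch shows only that once estimates share one format, the same comparison (<, >, =) and arithmetic (+, -, *, /) routines apply, regardless of whether a value came from a distance estimate or from another source.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Magnitude:
    """Hypothetical abstract, modality-free magnitude produced by an estimate."""
    value: float
    source: str = "unspecified"  # where the estimate came from; ignored by the operations


def compare(a: Magnitude, b: Magnitude, tolerance: float = 0.0) -> str:
    """Qualitative comparison returning '<', '>', or '='."""
    if abs(a.value - b.value) <= tolerance:
        return "="
    return "<" if a.value < b.value else ">"


_OPERATIONS = {"+": operator.add, "-": operator.sub,
               "*": operator.mul, "/": operator.truediv}


def compute(a: Magnitude, op: str, b: Magnitude) -> Magnitude:
    """Quantitative operation on two magnitudes in the shared format."""
    return Magnitude(_OPERATIONS[op](a.value, b.value), source="computed")


near = Magnitude(3.0, source="distance estimate")
far = Magnitude(5.0, source="distance estimate")
count = Magnitude(4.0, source="numerical")
print(compare(near, far), compute(far, "-", near).value, compare(count, near))  # < 2.0 >
```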


4  Spatial  Competence  in  the  Brain 

Thus  far,  the  discussion  of  the  architecture  for  spatial  competence  has  centered  around 
the  structure  and  mechanisms  required  to  support  spatial  information  processing 
within  ACT-R.  In  this  section,  we  present  some  information  regarding  the  mapping  of 
those  structures  and  mechanisms  to  particular  brain  areas.  Neuropsychological 
evidence  concerning  spatial  abilities  in  humans  is  extensive.  It  has  been  shown  that 
the  parietal  lobe  is  critical  in  processing  spatial  information,  and  a  very  large  number 
of  studies  have  attributed  particular  aspects  of  spatial  cognition  to  particular  portions 
of  the  parietal  lobe  and  other  portions  of  the  cortex  (cf.  [73]).

A  comprehensive  review  of  the  neuropsychological  evidence  concerning  spatial 
cognition  is  not  presented  here.  What  we  do  provide  is  an  overview  of  the  mapping  of 
the  spatial  competence  architecture  to  brain  regions  without  considering  the  mapping 
of  other  components  of  ACT-R  to  the  brain,  which  has  been  addressed  elsewhere 
(e.g.,  [26]).  Along  the  way,  we  cite  important  research  that  supports  our  position,  but 
generally  do  not  take  time  to  examine  all  the  perspectives.  In  addition,  area 
delineations  and  hypothesized  locations  should  be  considered  as  approximate.  There 
is  a  great  deal  of  complexity  in  the  human  cortex,  and  we  do  not  wish  to  suggest  that 
cognitive  functions  are  exclusively  localized  in  the  regions  we  suggest,  nor  do  we 
believe  necessarily  that  these  are  the  only  functions  performed  by  the  various
locations. 

Figure  1  above  contains  our  hypothesized  assignment  of  components  of  our 
proposal  to  brain  regions.  We  have  placed  the  egocentric  buffer  in  the  posterior 
parietal  lobe,  within  the  superior  parietal  lobule.  This  follows  research  that  has
identified  a  distinction  between  a  dorsal  “where”  or  action  stream  and  a  ventral  “what”
stream  [52],  [66].  We  view  the  egocentric  buffer  as  representing  the  output  of  the
dorsal  stream.  Next,  the  environmental  buffer,  which  encodes  object  location  with
respect  to  an  exocentric  frame  of  reference,  is  in  the  inferior  portion  of  the  lateral 
occipital  gyrus.  This  area  and  nearby  areas  (including  the  parahippocampal  cortex) 
have  been  associated,  variously,  with  acquiring  exocentric  spatial  information  [74], 
representing  the  local  visible  environment  [75],  perceiving  and  encoding  landmarks 
[76],  [29],  encoding  ‘building  stimuli’  [74],  and  encoding  of  ‘large  objects’  [77].  All 
of  these  things  can  be  seen  to  relate  to  identifying  the  location  of  an  object  with 
respect  to  an  exocentric  frame  of  reference  based  on  what  is  visible  in  the  surrounding 
environment. 

We  attribute  to  the  hippocampus  the  role  of  encoding  episodic  information  about 
visual  experience  (although  it  is  plausible,  even  likely,  that  this  incorporates  other 
sensory  modalities  as  well).  This  lines  up  closely  with  the  description  of  the  function 
of  the  hippocampus  given  by  Kosslyn  [22].  He  states,  “...the  hippocampus  may  set  up 
the  neural  equivalent  of  ‘pointers’,  linking  representations  that  are  stored  in  different 
loci...”  (p.  223).  As  noted  earlier,  others  have  posited  other  roles  for  the  hippocampus 
in  spatial  cognition,  particularly  with  regard  to  place  cells  and  the  cognitive  map. 
Space  limitations  here  prevent  us  from  reviewing  and  commenting  on  the  evidence 
relevant  to  this  issue. 

Spatial  operations  take  place  across  the  parietal  lobe,  as  noted  in  Figure  1. 
However,  we  posit  that  the  different  functions  we  have  identified  for  the  spatial 
module  can  be  localized  to  different  parts  of  the  parietal  lobe.  Still,  even  these  more 
specific  references  represent  substantial  abstractions.  For  instance,  the  superior 
parietal  lobule  has  been  associated  with  visuospatial  working  memory  operations 
[78],  [79],  [80],  and  we  relate  this  region  to  the  component  of  the  spatial  module  that 
performs  spatial  transformations.  The  angular  gyrus  is  active  in  spatial  tasks  generally 
[69],  [71]  and  in  tasks  requiring  calculations,  particularly  mathematics,  more 
specifically  [72],  [81].  Thus,  we  associate  the  angular  gyrus  with  performing 
magnitude  computations.  This  conceptualization  of  the  function  of  this  area  actually 
provides  a  unification  across  some  of  the  different  notions  of  the  role  of  this  portion 
of  the  cortex.  Finally,  proposals  for  the  role  of  the  supramarginal  gyrus  include 
directing  spatial  attention  (e.g.,  [68]),  mental  imagery  (e.g.,  [82]),  and  motor 
preparation  [65].  All  of  these  functions  fit  well  with  the  role  attributed  to  this  area  in 
our  account,  which  is  performing  magnitude  estimations.  Additionally,  these 
operations  all  rely  on  a  representation  of  location  (following  [52])  to  support  action 
(as  suggested  by  [66]).  So  once  again,  our  theory  provides  a  potential  unification  for 
seemingly  disparate  results. 

Mental  imagery  is  captured  in  Figure  1  in  the  connections  between  components  of 
the  vision  module  and  the  spatial  module,  which  indicate  processing  links  between 
brain  regions.  We  have  not  yet  associated  all  of  these  links  with  particular  pathways 
in  the  brain,  but  there  is  evidence  for  at  least  some  of  them  (cf.  [22]).  This  proposal
for  the  generation  and  manipulation  of  visual  mental  images  lines  up  well  with  work
by  Kosslyn.  Certainly,  many  of  the  details  are  missing  in  the  current  mapping  of 
components  of  this  account  onto  the  brain,  but  the  emerging  view  is  consistent  with 
what  we  know  about  mental  imagery  and  neuropsychology. 

Finally,  the  buffer  for  the  spatial  module  resides  in  the  frontal  cortex.  The  frontal 
cortex  is  associated  with  high-level  planning  and  goal  maintenance  activities. 
Additionally,  dorsolateral  prefrontal  cortex  (DLPFC)  shows  enhanced  activity  in
performing  spatial  tasks  [83],  [84].  We  view  this  activity  as  stemming  from  the 
processing  requirements  of  managing  requests  for  spatial  operations  and  harvesting 
the  results  of  that  processing.  This  anatomical  relationship  is  similar  to  the  proposed 
mapping  of  declarative  memory  to  the  brain  in  ACT-R,  where  the  buffer  is  in
ventrolateral  prefrontal  cortex  (VLPFC)  and  the  actual  storage  of  declarative  information
occurs  in  the  temporal  lobe  and  hippocampus  [26]. 

In  summary,  we  have  proposed  a  tentative  mapping  of  spatial  information
processing  structures  and  mechanisms  to  brain  areas.  The  selective  review  we  have 
presented  illustrates  that  empirical  and  neuropsychological  evidence  generally 
supports  the  mapping  we  have  developed.  There  is,  as  mentioned,  a  vast 
psychological  literature  relating  to  this  topic,  and  there  are  many  neuropsychological 
phenomena  for  which  this  mapping  does  not  provide  an  account.  As  we  implement 
the  mechanisms,  and  validate  the  performance  of  the  entire  system  against  human 
empirical  and  neuropsychological  data,  we  will  use  key  findings  in  this  literature  to 
refine  the  mechanisms  that  are  implemented  to  account  for  the  processes  that  are 
occurring  in  the  brain  when  humans  perform  spatial  tasks. 


5  Conclusion 

We  have  described  a  set  of  mechanisms  for  human  spatial  competence.  The 
architecture  proposed  is  consonant  with  existing  empirical  and  theoretical  evidence 
regarding  the  capabilities  and  limitations  of  human  spatial  information  processing, 
and  is  also  consistent  with  current  knowledge  about  the  functional  neuroanatomy  of 
the  brain.  In  addition,  our  account  is  integrated  with  the  ACT-R  cognitive 
architecture,  which  is  a  well-validated,  quantitative  theory  of  human  cognition.  As  the 
scope  of  cognitive  architectures  expands,  and  as  processing  limitations  of  computer
technology  are  overcome,  it  is  critical  that  psychologically  valid  accounts  of  human 
spatial  competence  be  implemented  in  cognitive  architectures.  Incorporating 
mechanisms  for  spatial  competence  will  allow  cognitive  architectures,  like  ACT-R,  to 
provide  quantitative  accounts  of  human  performance  in  a  wider  range  of  task 
environments.  This  will  be  critical  for  achieving  the  goals  of  unified  theories  of 
cognition  [47]. 

On  the  other  hand,  it  is  also  vital  that  theories  of  human  spatial  competence 
incorporate  mechanisms  that  account  for  capacities  and  limitations  in  human 
perceptual,  cognitive,  and  motor  performance.  In  any  task,  it  is  the  interplay  of  the 
entire  system  that  produces  the  behavior  that  can  be  observed.  By  linking  our  account 
to  ACT-R,  we  can  leverage  the  mechanisms  of  a  well-validated  theory  of  the  human 
cognitive  architecture.  Mechanisms  for  spatial  competence  fill  in  a  significant  gap  in 
ACT-R’s  capabilities,  just  as  ACT-R  provides  detailed  mechanisms  for  memory  and 
performance  that  link  spatial  competence  to  human  cognition  more  broadly.  Of
course,  the  proposal  we  have  described  in  this  paper  does  not  address  every 
phenomenon  in  the  literature  on  human  spatial  information  processing,  but  it  does 
provide  an  integrated  framework  that  can  be  applied  widely  for  understanding  the 
capacities  and  limitations  of  human  cognition  in  this  area.  As  the  structures  and 
mechanisms  are  implemented,  we  will  focus  on  the  empirical,  theoretical,  and 
neuropsychological  details,  to  ensure  that  our  account  is  psychologically  valid.  For 
example,  perhaps  the  processing  mechanisms  currently  grouped  within  a  single 
“spatial”  module  are  better  conceived  as  a  set  of  two,  or  even  three,  separate  modules  that
interact  in  spatial  information  processing.  This  has  implications  for  capacities  and 
processes,  and  these  details  will  matter  when  the  architecture  is  utilized  to  provide 
quantitative  accounts  of  human  performance.  This  and  other  issues  will  be  addressed 
as  we  move  forward.  A  critical  point,  however,  is  that  a  computational  model, 
implemented  within  a  cognitive  architecture,  is  vital  for  tackling  these  issues  at  this 
level  of  detail.  Thus,  we  are  enthusiastic  and  optimistic  about  the  potential  for 
generating  a  unified,  comprehensive  account  of  how  humans  encode,  store,  and 
process  spatial  information. 